Vectorized and performance‐portable quicksort

Abstract

Recent works showed that implementations of quicksort using vector CPU instructions can outperform the non-vectorized algorithms in widespread use. However, these implementations are typically single-threaded, implemented for a particular instruction set, and restricted to a small set of key types. We lift these three restrictions: our proposed vqsort algorithm integrates into the state-of-the-art parallel sorter ips⁴o, with a geometric mean speedup of 1.59. The same implementation works on seven instruction sets (including SVE and RISC-V V) across four platforms. It also supports floating-point and 16–128 bit integer keys. To the best of our knowledge, this is the fastest sort for large arrays of non-tuple keys on CPUs, up to 20 times as fast as the sorting algorithms in standard libraries. This article focuses on the practical engineering aspects enabling the speed and portability, which we have not yet seen demonstrated for a quicksort implementation. Furthermore, we introduce compact and transpose-free sorting networks for in-register sorting of small arrays, and a vector-friendly pivot sampling strategy that is robust against adversarial input.
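As a rough illustration of what a sorting network is, the sketch below sorts exactly four keys with a fixed, data-independent sequence of compare-exchange steps. It is not the network from the article: the comparator list `NETWORK_4` and the function `sort4` are hypothetical names chosen for this example, and the code is scalar Python rather than vector instructions. The point is only that the sequence of comparisons never depends on the data, which is the property that makes such networks suitable for vector registers.

```python
# Illustrative only: a classic 5-comparator network for 4 inputs.
# This is NOT the transpose-free network described in the article.
NETWORK_4 = [(0, 1), (2, 3), (0, 2), (1, 3), (1, 2)]


def sort4(values):
    """Sort exactly four keys using a fixed sequence of compare-exchange steps."""
    v = list(values)
    if len(v) != 4:
        raise ValueError("sort4 expects exactly four keys")
    for i, j in NETWORK_4:
        # Compare-exchange: the smaller key goes to position i, the larger to j.
        lo, hi = min(v[i], v[j]), max(v[i], v[j])
        v[i], v[j] = lo, hi
    return v


print(sort4([3, 1, 2, 0]))
```

Because each step is a min/max pair on fixed positions, the same steps can in principle be applied to whole vectors of keys at once, which is the general idea behind in-register sorting.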

Similar resources

A Novel Hybrid Quicksort Algorithm Vectorized using AVX-512 on Intel Skylake

The modern CPU’s design, which is composed of hierarchical memory and SIMD/vectorization capability, governs the potential for algorithms to be transformed into efficient implementations. The release of the AVX-512 changed things radically, and motivated us to search for an efficient sorting algorithm that can take advantage of it. In this paper, we describe the best strategy we have found, whi...

Quicksort Revisited - Verifying Alternative Versions of Quicksort

We verify the correctness of a recursive version of Tony Hoare’s quicksort algorithm using the Hoare-logic based verification tool Dafny. We then develop a non-standard, iterative version which is based on a stack of pivot-locations rather than the standard stack of ranges. We outline an incomplete Dafny proof for the latter.
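For orientation, here is a minimal Python sketch of the standard iterative formulation that keeps an explicit stack of ranges. It is not the Dafny code from that paper, and it does not implement the paper's non-standard variant based on a stack of pivot locations. The function name `quicksort_iterative` and the middle-element pivot choice are assumptions made for this example.

```python
def quicksort_iterative(a):
    """Sort the list `a` in place using an explicit stack of (lo, hi) ranges."""
    stack = [(0, len(a) - 1)]
    while stack:
        lo, hi = stack.pop()
        if lo >= hi:
            continue  # ranges with fewer than two elements are already sorted
        pivot = a[(lo + hi) // 2]  # assumption: middle element as pivot
        i, j = lo, hi
        # Hoare-style partition: move i right and j left until they cross.
        while i <= j:
            while a[i] < pivot:
                i += 1
            while a[j] > pivot:
                j -= 1
            if i <= j:
                a[i], a[j] = a[j], a[i]
                i += 1
                j -= 1
        # Push both sub-ranges instead of recursing.
        stack.append((lo, j))
        stack.append((i, hi))
    return a


print(quicksort_iterative([5, 2, 9, 1, 5, 6]))
```

Replacing recursion with a stack of ranges makes the pending work explicit, which is the baseline the abstract contrasts with its stack of pivot locations.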

Quicksort asymptotics

The number of comparisons Xn used by Quicksort to sort an array of n distinct numbers has mean μn of order n log n and standard deviation of order n. Using different methods, Régnier and Rösler each showed that the normalized variate Yn := (Xn−μn)/n converges in distribution, say to Y ; the distribution of Y can be characterized as the unique fixed point with zero mean of a certain distribution...

Vectorized Cluster Search *

Contrary to conventional wisdom, the construction of clusters on a lattice can easily be vectorized, namely over each “generation” in a breadth-first search. This applies directly to, e.g., the single-cluster variant of the Swendsen-Wang algorithm. On a Cray Y-MP, total CPU time was reduced by a factor of 3.5 to 7 in actual applications. * Submitted to Computer Physics Communications.

Resilient Quicksort and Selection

We consider the problem of sorting a sequence of n keys in a RAM-like environment where memory faults are possible. An algorithm is said to be δ-resilient if it can tolerate up to δ memory faults during its execution. A resilient sorting algorithm must produce a sequence where every pair of uncorrupted keys is ordered correctly. Finocchi, Grandoni, and Italiano devised a δ-resilient determinist...

Journal

Journal title: Software - Practice and Experience

Year: 2022

ISSN: 0038-0644, 1097-024X

DOI: https://doi.org/10.1002/spe.3142